Wishart distribution (wishart) — Random scatter / covariance matrices#
The Wishart distribution is a probability distribution over symmetric positive-definite (SPD) matrices. It is the multivariate generalization of the chi-square distribution, and it appears whenever you sum outer products of multivariate Gaussian vectors.
A canonical generative story (integer degrees of freedom):
Draw (x_1,\dots,x_\nu \stackrel{iid}{\sim} \mathcal{N}_p(0,\Sigma))
Form the scatter matrix (W = \sum_{i=1}^\nu x_i x_i^\top = X^\top X), where (X) is the (\nu\times p) matrix whose rows are the (x_i^\top)
Then (W \sim \mathrm{Wishart}_p(\nu,\Sigma)).
Learning goals#
Classify the distribution and understand its support/parameter space.
Write the PDF and interpret the matrix-valued CDF.
Compute/derive mean and covariance of entries, the (matrix) MGF, and entropy.
Implement NumPy-only sampling (Bartlett decomposition).
Visualize scalar marginals and Monte Carlo behavior.
Use scipy.stats.wishart (and understand what it does not implement).
Prerequisites#
Linear algebra: eigenvalues, determinants, trace, Cholesky
Multivariate normal distribution
Basic matrix calculus intuition (for the likelihood section)
Notebook roadmap#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations
Sampling & Simulation
Visualization
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
import scipy
from scipy import stats
from scipy.stats import wishart
from scipy.special import multigammaln, psi
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
SEED = 7
rng = np.random.default_rng(SEED)
# Quick/slow toggle (mirrors patterns used elsewhere in this repo)
FAST_RUN = True
N_SAMPLES = 25_000 if FAST_RUN else 250_000
import sys
import plotly
print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("SciPy:", scipy.__version__)
print("Plotly:", plotly.__version__)
print("Seed:", SEED)
print("FAST_RUN:", FAST_RUN, "N_SAMPLES:", N_SAMPLES)
Python: 3.12.9
NumPy: 1.26.2
SciPy: 1.15.0
Plotly: 6.5.2
Seed: 7
FAST_RUN: True N_SAMPLES: 25000
1) Title & Classification#
Name: wishart (Wishart distribution)
Type: Continuous (matrix-valued distribution)
Support: the cone of (p\times p) symmetric positive-definite matrices
[ \mathcal{S}_p^{++} = \left\{W \in \mathbb{R}^{p\times p} : W=W^\top,\; x^\top W x > 0\;\;\forall x\neq 0\right\}. ]
Parameter space:
degrees of freedom: (\nu > p-1)
scale matrix: (\Sigma \in \mathcal{S}_p^{++})
Notation: [ W \sim \mathrm{Wishart}_p(\nu, \Sigma). ]
SciPy uses scipy.stats.wishart(df=nu, scale=Sigma).
Some texts require (\nu\in\mathbb{N}) (because of the Gaussian “sum of outer products” construction), but the PDF extends to real (\nu>p-1).
2) Intuition & Motivation#
2.1 What it models#
The Wishart distribution models a random SPD matrix that behaves like a noisy version of a target matrix (\Sigma). The most important example is the Gaussian scatter / covariance setting:
If (x_1,\dots,x_\nu \stackrel{iid}{\sim} \mathcal{N}_p(0,\Sigma)), then [ W = \sum_{i=1}^{\nu} x_i x_i^\top \sim \mathrm{Wishart}_p(\nu,\Sigma). ]
Because the mean is known to be zero, the unbiased sample covariance is (S = \frac{1}{\nu}W), so the Wishart distribution governs the sampling variability of covariance matrices.
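A minimal sketch of this construction (the dedicated samplers come in Section 7); the seed and numbers here are illustrative only:
nu_demo = 50
Sigma_demo = np.array([
    [1.0, 0.3],
    [0.3, 2.0],
])
rng_demo = np.random.default_rng(123)
# Stack nu iid N_p(0, Sigma) draws as rows and form the scatter matrix X^T X.
X_demo = rng_demo.multivariate_normal(mean=np.zeros(2), cov=Sigma_demo, size=nu_demo)
S_demo = (X_demo.T @ X_demo) / nu_demo  # sample covariance; E[S] = Sigma
print(np.round(S_demo, 3))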
2.2 Typical real-world use cases#
Multivariate statistics: sampling distribution of covariance matrices, PCA/FA uncertainty.
Hypothesis testing: tests about (\Sigma) (sphericity, equality of covariances, etc.).
Bayesian modeling: conjugate prior/posterior for the precision matrix of a multivariate normal.
Random matrix theory / simulation: generating random SPD matrices with controlled mean/scale.
2.3 Relations to other distributions#
Chi-square: for (p=1), Wishart reduces to a scaled chi-square distribution (a quick numerical check follows this list).
Gamma: (\chi^2_\nu) is a Gamma special case, so (p=1) Wishart is also (scaled) Gamma.
Inverse-Wishart: distribution of (W^{-1}); commonly used as a prior on covariance matrices.
Matrix normal: if (X) is matrix-normal, then (X^\top X) has a Wishart form.
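As a quick numerical check of the (p=1) chi-square relation (a sketch; the df and scale values are arbitrary):
nu1, sigma2, w = 7.0, 2.5, 4.0
# Wishart_1(nu, sigma2) density at w equals the chi^2_nu density of w/sigma2, rescaled.
print(wishart(df=nu1, scale=sigma2).pdf(w))
print(stats.chi2.pdf(w / sigma2, df=nu1) / sigma2)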
3) Formal Definition#
Let (W\in\mathcal{S}_p^{++}), (\nu>p-1), and (\Sigma\in\mathcal{S}_p^{++}).
3.1 Multivariate gamma function#
The multivariate gamma function is [ \Gamma_p(a) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma\!\left(a + \frac{1-j}{2}\right). ]
SciPy provides the stable log version as scipy.special.multigammaln(a, p).
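A quick sanity check of the product formula against scipy.special.multigammaln (a sketch with arbitrary (a) and (p)):
from scipy.special import gammaln

a, p_chk = 4.7, 3
j = np.arange(1, p_chk + 1)
# log Gamma_p(a) = (p(p-1)/4) * log(pi) + sum_j log Gamma(a + (1-j)/2)
log_gamma_p = (p_chk * (p_chk - 1) / 4) * np.log(np.pi) + gammaln(a + (1 - j) / 2).sum()
print(log_gamma_p, multigammaln(a, p_chk))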
3.2 PDF#
The Wishart PDF is [ f(W\mid \nu,\Sigma) = \frac{|W|^{(\nu-p-1)/2}\,\exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}W)\right)}{2^{\nu p/2}\,|\Sigma|^{\nu/2}\,\Gamma_p(\nu/2)} \quad\text{for } W\in\mathcal{S}_p^{++}. ]
3.3 CDF (matrix-valued)#
A natural CDF uses the Loewner (PSD) order (\preceq): [ F(X) = \mathbb{P}(W \preceq X) = \int_{0\prec W\preceq X} f(W)\,dW. ]
For (p>1), this CDF generally has no simple closed form; expressions involve special functions of a matrix argument.
In practice, you often work with scalar functionals (e.g., diagonal entries, trace, determinant) whose CDFs are easier.
3.4 Scalar special case ((p=1))#
If (p=1), then (W) is a positive scalar and [ W \sim \mathrm{Wishart}_1(\nu, \Sigma)\quad\Longleftrightarrow\quad \frac{W}{\Sigma} \sim \chi^2_{\nu}. ] So [ F(w) = \mathbb{P}(W\le w) = F_{\chi^2_{\nu}}\!\left(\frac{w}{\Sigma}\right). ]
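A Monte Carlo check of this scalar CDF identity (a sketch; values arbitrary):
rng_p1 = np.random.default_rng(11)
nu1, sigma2, w = 6.0, 1.8, 9.0
draws = wishart(df=nu1, scale=sigma2).rvs(size=50_000, random_state=rng_p1)
print(np.mean(draws <= w))                 # empirical P(W <= w)
print(stats.chi2.cdf(w / sigma2, df=nu1))  # chi-square CDF of w / Sigma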
def is_square_matrix(A: np.ndarray) -> bool:
A = np.asarray(A)
return A.ndim == 2 and A.shape[0] == A.shape[1]
def is_symmetric(A: np.ndarray, *, atol: float = 1e-10) -> bool:
A = np.asarray(A)
return is_square_matrix(A) and np.allclose(A, A.T, atol=atol)
def cholesky_spd(A: np.ndarray) -> np.ndarray:
'''Return the Cholesky factor L of an SPD matrix A = L L^T (raises if not SPD).'''
A = np.asarray(A, dtype=float)
if not is_symmetric(A):
raise ValueError("Matrix must be symmetric.")
return np.linalg.cholesky(A)
def validate_scale(scale: np.ndarray) -> np.ndarray:
scale = np.asarray(scale, dtype=float)
if not is_square_matrix(scale):
raise ValueError(f"scale must be a square 2D matrix, got shape={scale.shape}")
if not is_symmetric(scale):
raise ValueError("scale must be symmetric")
_ = cholesky_spd(scale) # raises if not SPD
return scale
def validate_df(df: float, p: int) -> float:
df = float(df)
if not (df > p - 1):
raise ValueError(f"df must be > p-1 (p={p}), got df={df}")
return df
def wishart_logpdf_numpy(W: np.ndarray, df: float, scale: np.ndarray) -> float:
'''Log-PDF of Wishart_p(df, scale) at a single SPD matrix W.'''
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
W = np.asarray(W, dtype=float)
if W.shape != (p, p):
raise ValueError(f"W must have shape {(p, p)}, got {W.shape}")
if not is_symmetric(W):
raise ValueError("W must be symmetric")
# W must be SPD for the density to be finite.
_ = cholesky_spd(W)
sign_w, logdet_w = np.linalg.slogdet(W)
sign_s, logdet_s = np.linalg.slogdet(scale)
if sign_w <= 0 or sign_s <= 0:
return -np.inf
trace_term = np.trace(np.linalg.solve(scale, W))
log_norm = (df * p / 2) * np.log(2.0) + (df / 2) * logdet_s + multigammaln(df / 2, p)
return 0.5 * (df - p - 1) * logdet_w - 0.5 * trace_term - log_norm
def wishart_pdf_numpy(W: np.ndarray, df: float, scale: np.ndarray) -> float:
return float(np.exp(wishart_logpdf_numpy(W, df, scale)))
def wishart_mean(df: float, scale: np.ndarray) -> np.ndarray:
scale = validate_scale(scale)
df = validate_df(df, scale.shape[0])
return df * scale
def wishart_cov_entry(df: float, scale: np.ndarray, i: int, j: int, k: int, l: int) -> float:
'''Cov(W_ij, W_kl) for W ~ Wishart_p(df, scale).'''
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
for idx in (i, j, k, l):
if not (0 <= idx < p):
raise IndexError("index out of bounds")
return df * (scale[i, k] * scale[j, l] + scale[i, l] * scale[j, k])
def wishart_var_matrix(df: float, scale: np.ndarray) -> np.ndarray:
'''Elementwise variances Var(W_ij).'''
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
out = np.empty((p, p), dtype=float)
for i in range(p):
for j in range(p):
out[i, j] = wishart_cov_entry(df, scale, i, j, i, j)
return out
def wishart_expected_logdet(df: float, scale: np.ndarray) -> float:
'''E[log |W|] for W ~ Wishart_p(df, scale).'''
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
sign_s, logdet_s = np.linalg.slogdet(scale)
if sign_s <= 0:
return np.nan
terms = psi((df + 1 - np.arange(1, p + 1)) / 2)
return float(logdet_s + p * np.log(2.0) + np.sum(terms))
def wishart_entropy(df: float, scale: np.ndarray) -> float:
'''Differential entropy H(W) for W ~ Wishart_p(df, scale).'''
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
_, logdet_s = np.linalg.slogdet(scale)
digamma_sum = float(np.sum(psi((df + 1 - np.arange(1, p + 1)) / 2)))
return float(
(df * p / 2)
+ multigammaln(df / 2, p)
+ ((p + 1) / 2) * logdet_s
+ (p * (p + 1) / 2) * np.log(2.0)
- ((df - p - 1) / 2) * digamma_sum
)
def wishart_mgf(T: np.ndarray, df: float, scale: np.ndarray) -> float:
    '''Matrix MGF: M(T) = E[exp(tr(T W))] = |I - 2 T scale|^{-df/2}.
    Domain: I - 2 T scale must have positive eigenvalues (checked below via an
    equivalent symmetric matrix).
    '''
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
T = np.asarray(T, dtype=float)
if T.shape != (p, p) or not is_symmetric(T):
raise ValueError(f"T must be a symmetric {(p,p)} matrix")
    # Work with the symmetric matrix I - 2 L^T T L, where L = chol(scale); by
    # Sylvester's determinant identity, det(I - 2 L^T T L) = det(I - 2 T scale).
    L = np.linalg.cholesky(scale)
    A = np.eye(p) - 2.0 * (L.T @ T @ L)
A = 0.5 * (A + A.T)
_ = cholesky_spd(A) # domain check
sign, logdet = np.linalg.slogdet(A)
if sign <= 0:
return np.nan
return float(np.exp(-(df / 2) * logdet))
4) Moments & Properties#
Let (W \sim \mathrm{Wishart}_p(\nu,\Sigma)).
4.1 Mean#
[ \mathbb{E}[W] = \nu\,\Sigma. ]
4.2 Covariance structure (entrywise)#
For indices (i,j,k,\ell\in\{1,\dots,p\}), [ \mathrm{Cov}(W_{ij}, W_{k\ell}) = \nu\,\big(\Sigma_{ik}\Sigma_{j\ell} + \Sigma_{i\ell}\Sigma_{jk}\big). ] In particular, [ \mathrm{Var}(W_{ij}) = \nu\,(\Sigma_{ii}\Sigma_{jj}+\Sigma_{ij}^2). ]
4.3 Useful scalar marginals#
Even though (W) is matrix-valued, some scalar pieces are simple:
Diagonal entries: for every (i), [ \frac{W_{ii}}{\Sigma_{ii}} \sim \chi^2_{\nu}. ] So
(\mathbb{E}[W_{ii}] = \nu\Sigma_{ii})
(\mathrm{Var}(W_{ii}) = 2\nu\Sigma_{ii}^2)
skewness (=\sqrt{8/\nu})
excess kurtosis (=12/\nu)
Trace (special case): if (\Sigma=I), then [ \mathrm{tr}(W) \sim \chi^2_{\nu p} ] because it is the sum of (\nu p) independent standard-normal squares.
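A Monte Carlo check of the trace identity (sketch; df and sample size arbitrary):
rng_tr = np.random.default_rng(21)
p_tr, nu_tr = 3, 10
Ws_tr = wishart(df=nu_tr, scale=np.eye(p_tr)).rvs(size=20_000, random_state=rng_tr)
traces = np.trace(Ws_tr, axis1=1, axis2=2)
print(traces.mean(), nu_tr * p_tr)  # E[chi^2_{nu p}] = nu * p
print(stats.kstest(traces, stats.chi2(df=nu_tr * p_tr).cdf).pvalue)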
4.4 MGF / characteristic function (matrix argument)#
For a symmetric matrix (T) such that (I-2T\Sigma) has positive eigenvalues (equivalently, (I-2\Sigma^{1/2}T\Sigma^{1/2}) is SPD), [ M(T) = \mathbb{E}[\exp(\mathrm{tr}(TW))] = |I-2T\Sigma|^{-\nu/2}. ] The characteristic function is the same expression with (T\mapsto iT).
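A Monte Carlo sanity check of the matrix MGF, using the wishart_mgf helper defined above (a sketch; (T) is kept small so it lies inside the domain, and the MC estimate is noisy):
rng_mgf = np.random.default_rng(31)
df_chk = 6.0
scale_chk = np.array([
    [1.0, 0.2],
    [0.2, 0.8],
])
T_chk = np.array([
    [0.05, 0.01],
    [0.01, 0.03],
])
Ws_chk = wishart(df=df_chk, scale=scale_chk).rvs(size=50_000, random_state=rng_mgf)
mc_mgf = np.mean(np.exp(np.einsum("ij,sji->s", T_chk, Ws_chk)))  # E[exp(tr(T W))]
print(mc_mgf, wishart_mgf(T_chk, df_chk, scale_chk))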
4.5 Mode#
If (\nu \ge p+1), the mode is [ W_{\text{mode}} = (\nu-p-1)\,\Sigma. ]
4.6 Entropy#
A closed form exists in terms of the multivariate gamma and digamma functions. One convenient form is [ \mathbb{E}[\log|W|] = \log|\Sigma| + p\log 2 + \sum_{i=1}^p \psi\!\left(\frac{\nu+1-i}{2}\right) ] and [ H(W) = \frac{\nu p}{2} + \log\Gamma_p\!\left(\frac{\nu}{2}\right) + \frac{p+1}{2}\log|\Sigma| + \frac{p(p+1)}{2}\log 2 - \frac{\nu-p-1}{2}\sum_{i=1}^p \psi\!\left(\frac{\nu+1-i}{2}\right). ]
df = 8
scale = np.array([
[1.5, 0.4],
[0.4, 1.0],
])
rv = wishart(df=df, scale=scale)
W0 = rv.rvs(random_state=rng)
print("Example W:")
print(np.round(W0, 4))
print()
print("Mean (theory):")
print(wishart_mean(df, scale))
print("Mean (SciPy):")
print(rv.mean())
print()
print("Var elementwise (theory):")
print(np.round(wishart_var_matrix(df, scale), 6))
print("Var elementwise (SciPy):")
print(np.round(rv.var(), 6))
print()
print("logpdf (ours):", wishart_logpdf_numpy(W0, df, scale))
print("logpdf (SciPy):", rv.logpdf(W0))
print("entropy (ours):", wishart_entropy(df, scale))
print("entropy (SciPy):", rv.entropy())
Example W:
[[12.807 3.4193]
[ 3.4193 4.1853]]
Mean (theory):
[[12. 3.2]
[ 3.2 8. ]]
Mean (SciPy):
[[12. 3.2]
[ 3.2 8. ]]
Var elementwise (theory):
[[36. 13.28]
[13.28 16. ]]
Var elementwise (SciPy):
[[36. 13.28]
[13.28 16. ]]
logpdf (ours): -7.04274089576966
logpdf (SciPy): -7.04274089576966
entropy (ours): 8.185358204431287
entropy (SciPy): 8.185358204431287
5) Parameter Interpretation#
Degrees of freedom (\nu)#
Controls the amount of information in the scatter matrix.
In the Gaussian construction, (\nu) is literally the number of vectors being summed (a sample size).
As (\nu) increases, (W/\nu) concentrates around (\Sigma) with relative fluctuations on the order of (1/\sqrt{\nu}).
Scale matrix (\Sigma)#
Sets the mean shape: (\mathbb{E}[W]=\nu\Sigma).
Encodes directions and scales via eigen-decomposition (\Sigma = Q\Lambda Q^\top):
the eigenvectors (Q) describe preferred directions
the eigenvalues (\Lambda) describe how spread those directions are on average
Shape changes (high level)#
Increasing (\nu): distribution tightens around (\nu\Sigma) (or around (\Sigma) if you look at (W/\nu)).
Increasing overall scale (e.g., (\Sigma\mapsto c\Sigma)): multiplies (W) by (c).
Changing correlations in (\Sigma): changes how strongly entries/eigenvalues of (W) move together.
In Section 8 we’ll visualize these effects via scalar summaries (diagonal entries, eigenvalues, log-determinant).
6) Derivations#
These derivations use the Gaussian construction with integer (\nu): (x_1,\dots,x_\nu \stackrel{iid}{\sim} \mathcal{N}_p(0,\Sigma)), (W=\sum_i x_i x_i^\top). The resulting formulas extend to real (\nu>p-1).
6.1 Expectation#
[ \mathbb{E}[W] = \sum_{i=1}^{\nu} \mathbb{E}[x_i x_i^\top] = \nu\,\Sigma. ]
6.2 Variance / covariance of entries#
Write (W_{ij}=\sum_{r=1}^{\nu} x_{r,i}x_{r,j}). Using independence across (r) and Isserlis’ theorem for Gaussian fourth moments, [ \mathrm{Cov}(W_{ij}, W_{k\ell}) = \nu\,\mathrm{Cov}(x_{1,i}x_{1,j},\, x_{1,k}x_{1,\ell}) = \nu\,(\Sigma_{ik}\Sigma_{j\ell} + \Sigma_{i\ell}\Sigma_{jk}). ]
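A Monte Carlo spot-check of this identity, using the wishart_cov_entry helper from above (a sketch; indices and sizes arbitrary):
rng_cv = np.random.default_rng(41)
df_cv = 9.0
scale_cv = np.array([
    [1.0, 0.4],
    [0.4, 1.5],
])
Ws_cv = wishart(df=df_cv, scale=scale_cv).rvs(size=100_000, random_state=rng_cv)
i, j, k, l = 0, 1, 1, 1
mc_cov = np.cov(Ws_cv[:, i, j], Ws_cv[:, k, l])[0, 1]  # empirical Cov(W_ij, W_kl)
print(mc_cov, wishart_cov_entry(df_cv, scale_cv, i, j, k, l))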
6.3 Likelihood (scale matrix (\Sigma))#
Given one observation (W), the log-likelihood (up to constants in (W)) is [ \ell(\Sigma;W) = -\frac{\nu}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}(\Sigma^{-1}W) + \text{const}. ] Differentiating w.r.t. (\Sigma^{-1}) yields the MLE [ \widehat{\Sigma}_{\text{MLE}} = \frac{1}{\nu}W. ]
For (m) i.i.d. observations (W_1,\dots,W_m) with common (\nu), [ \widehat{\Sigma}_{\text{MLE}} = \frac{1}{m\nu}\sum_{t=1}^m W_t. ]
Estimating (\nu) jointly with (\Sigma) requires solving a nonlinear equation (involving digamma functions); we’ll use a simple method-of-moments estimator as a practical alternative.
def wishart_fit_scale_mle(samples: np.ndarray, df: float) -> np.ndarray:
'''MLE of scale given df for i.i.d. Wishart draws.'''
samples = np.asarray(samples, dtype=float)
if samples.ndim != 3 or samples.shape[1] != samples.shape[2]:
raise ValueError("samples must have shape (n, p, p)")
p = samples.shape[1]
df = validate_df(df, p)
return samples.mean(axis=0) / df
def wishart_fit_df_mom(samples: np.ndarray) -> tuple[float, np.ndarray]:
'''Method-of-moments df estimate using diagonal entries.
Uses: Var(W_ii) / E[W_ii]^2 = 2/df (since W_ii / Sigma_ii ~ chi^2_df).
'''
samples = np.asarray(samples, dtype=float)
if samples.ndim != 3 or samples.shape[1] != samples.shape[2]:
raise ValueError("samples must have shape (n, p, p)")
diag = np.diagonal(samples, axis1=1, axis2=2) # (n, p)
m = diag.mean(axis=0)
v = diag.var(axis=0, ddof=0)
if np.any(m <= 0):
raise ValueError("Diagonal means must be positive for MOM df estimator")
df_hats = 2.0 / (v / (m**2))
df_hat = float(np.median(df_hats))
return df_hat, df_hats
def wishart_fit_df_scale_mom(samples: np.ndarray) -> dict:
df_hat, df_hats = wishart_fit_df_mom(samples)
scale_hat = wishart_fit_scale_mle(samples, df_hat)
return {"df": df_hat, "df_by_diag": df_hats, "scale": scale_hat}
# Demo: recover df and scale from synthetic samples
p = 3
scale_true = np.array([
[1.0, 0.4, 0.2],
[0.4, 1.7, 0.1],
[0.2, 0.1, 0.9],
])
df_true = 12
rv_true = wishart(df=df_true, scale=scale_true)
samples = rv_true.rvs(size=8_000 if FAST_RUN else 40_000, random_state=rng)
fit = wishart_fit_df_scale_mom(samples)
print("df true:", df_true)
print("df hat :", round(fit["df"], 3))
print("df hats by diagonal:", np.round(fit["df_by_diag"], 3))
print()
print("scale true:")
print(np.round(scale_true, 3))
print("scale hat :")
print(np.round(fit["scale"], 3))
df true: 12
df hat : 12.132
df hats by diagonal: [12.132 11.919 12.216]
scale true:
[[1. 0.4 0.2]
[0.4 1.7 0.1]
[0.2 0.1 0.9]]
scale hat :
[[0.984 0.396 0.194]
[0.396 1.688 0.094]
[0.194 0.094 0.889]]
7) Sampling & Simulation (NumPy-only)#
The most common efficient sampler is the Bartlett decomposition.
7.1 Bartlett decomposition (idea)#
For (W \sim \mathrm{Wishart}_p(\nu, I)):
Build a lower-triangular matrix (A) such that
diagonal: (A_{ii}^2 \sim \chi^2_{\nu-i+1})
sub-diagonal: (A_{ij}\sim \mathcal{N}(0,1)) for (i>j)
above diagonal: (0)
Then [ W = AA^\top. ]
For general scale (\Sigma): if (\Sigma = LL^\top) (Cholesky), then [ W = LAA^\top L^\top. ]
This works for any real (\nu>p-1) because chi-square is defined for non-integer degrees of freedom.
7.2 Direct Gaussian construction (also NumPy-only)#
If (\nu\in\mathbb{N}), you can also sample (x_r\sim \mathcal{N}_p(0,\Sigma)) and return (\sum_r x_r x_r^\top). It is conceptually simple but can be slower for large (\nu).
def wishart_rvs_bartlett(df: float, scale: np.ndarray, *, size: int = 1, rng=None) -> np.ndarray:
'''NumPy-only sampling via Bartlett decomposition.
Returns shape (p, p) if size=1, else (size, p, p).
'''
if rng is None:
rng = np.random.default_rng()
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
L = np.linalg.cholesky(scale)
out = np.empty((size, p, p), dtype=float)
for s in range(size):
A = np.zeros((p, p), dtype=float)
for i in range(p):
A[i, i] = np.sqrt(rng.chisquare(df - i))
if i > 0:
A[i, :i] = rng.standard_normal(i)
LA = L @ A
W = LA @ LA.T
out[s] = W
return out[0] if size == 1 else out
def wishart_rvs_normals(df: float, scale: np.ndarray, *, size: int = 1, rng=None) -> np.ndarray:
'''NumPy-only sampling via the Gaussian construction (requires integer df).'''
if rng is None:
rng = np.random.default_rng()
df_int = int(df)
if df_int != df:
raise ValueError("Gaussian construction requires integer df")
scale = validate_scale(scale)
p = scale.shape[0]
df = validate_df(df, p)
L = np.linalg.cholesky(scale)
Z = rng.standard_normal(size=(size, df_int, p))
X = Z @ L.T # each row ~ N(0, scale)
# W = X^T X for each sample
W = np.einsum("sni,snj->sij", X, X)
return W[0] if size == 1 else W
# Quick sanity check: sampler returns SPD
scale_demo = np.array([
[1.5, 0.4],
[0.4, 1.0],
])
W_demo = wishart_rvs_bartlett(8, scale_demo, size=1, rng=rng)
_ = np.linalg.cholesky(W_demo) # should not raise
W_demo
array([[ 2.356 , -0.1647],
[-0.1647, 5.092 ]])
8) Visualization#
Because Wishart is matrix-valued, direct visualization of its full PDF/CDF is difficult for (p>2). A practical approach is to visualize:
Scalar marginals that have known forms (e.g., diagonal entries)
Scalar summaries (trace, determinant, eigenvalues)
Below we show:
A single Monte Carlo draw (W) (as a heatmap)
PDF/CDF of a diagonal entry (W_{11}) (a scaled chi-square)
Scatter of ((W_{11}, W_{22})) to show dependence
How (\nu) changes concentration of (W/\nu)
df = 8
scale = np.array([
[1.5, 0.4],
[0.4, 1.0],
])
Ws = wishart_rvs_bartlett(df, scale, size=N_SAMPLES, rng=rng)
W_one = Ws[0]
fig = px.imshow(W_one, text_auto=".3f", title="One Wishart draw W (heatmap)")
fig.update_layout(coloraxis_showscale=True)
fig.show()
w11 = Ws[:, 0, 0]
w22 = Ws[:, 1, 1]
# Skewness/kurtosis of a diagonal element (should match chi-square)
skew_theory = np.sqrt(8.0 / df)
excess_kurt_theory = 12.0 / df
skew_mc = stats.skew(w11, bias=False)
excess_kurt_mc = stats.kurtosis(w11, fisher=True, bias=False)
print("W_11 skewness theory:", round(skew_theory, 4), "MC:", round(float(skew_mc), 4))
print("W_11 excess kurtosis theory:", round(excess_kurt_theory, 4), "MC:", round(float(excess_kurt_mc), 4))
# --- PDF of W_11 ---
x = np.linspace(np.percentile(w11, 0.5), np.percentile(w11, 99.5), 400)
pdf_theory = stats.chi2.pdf(x / scale[0, 0], df=df) / scale[0, 0]
fig = go.Figure()
fig.add_trace(go.Histogram(x=w11, nbinsx=60, histnorm="probability density", name="MC (W_11)"))
fig.add_trace(go.Scatter(x=x, y=pdf_theory, mode="lines", name="Theory (scaled chi-square)"))
fig.update_layout(
title="Wishart diagonal marginal: PDF of W_11",
xaxis_title="w",
yaxis_title="density",
)
fig.show()
# --- CDF of W_11 ---
xs = np.sort(w11)
ecdf = np.arange(1, xs.size + 1) / xs.size
cdf_theory = stats.chi2.cdf(xs / scale[0, 0], df=df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ecdf, mode="lines", name="Empirical CDF"))
fig.add_trace(go.Scatter(x=xs, y=cdf_theory, mode="lines", name="Theory CDF"))
fig.update_layout(
title="Wishart diagonal marginal: CDF of W_11",
xaxis_title="w",
yaxis_title="F(w)",
)
fig.show()
# --- Dependence between diagonal entries ---
fig = px.scatter(
x=w11,
y=w22,
opacity=0.25,
title="Monte Carlo samples: (W_11, W_22)",
labels={"x": "W_11", "y": "W_22"},
)
fig.show()
W_11 skewness theory: 1.0 MC: 0.9817
W_11 excess kurtosis theory: 1.5 MC: 1.4707
# Effect of df on concentration of W/df around scale
scale3 = np.array([
[1.0, 0.6, 0.2],
[0.6, 1.8, 0.3],
[0.2, 0.3, 0.7],
])
p = scale3.shape[0]
dfs = [p + 0.5, 10, 40]
ns = 5_000 if FAST_RUN else 25_000
rows = []
for df0 in dfs:
Ws0 = wishart_rvs_bartlett(df0, scale3, size=ns, rng=rng)
# Frobenius distance of W/df to the target scale
dist = np.linalg.norm(Ws0 / df0 - scale3, axis=(1, 2))
rows.append({"df": np.full(ns, df0), "dist": dist})
df_col = np.concatenate([r["df"] for r in rows])
dist_col = np.concatenate([r["dist"] for r in rows])
fig = px.histogram(
x=dist_col,
color=df_col.astype(str),
nbins=70,
barmode="overlay",
opacity=0.55,
title="Concentration increases with df: ||W/df - scale||_F",
labels={"x": "Frobenius distance", "color": "df"},
)
fig.show()
9) SciPy Integration (scipy.stats.wishart)#
SciPy provides a frozen Wishart distribution via:
rv = scipy.stats.wishart(df=nu, scale=Sigma)
Available methods (SciPy 1.15):
pdf, logpdf, rvs, mean, var, mode, entropy
Notably missing (as of SciPy 1.15):
cdf (the matrix-valued CDF is nontrivial), fit (no built-in MLE)
Workarounds:
For CDF-like quantities, use scalar marginals (e.g., diagonal entries) or Monte Carlo estimates of (\mathbb{P}(W\preceq X)).
For fitting, use the closed-form MLE for (\Sigma) given (\nu), or a method-of-moments / numerical MLE for (\nu).
df = 8
scale = np.array([
[1.5, 0.4],
[0.4, 1.0],
])
rv = wishart(df=df, scale=scale)
W = rv.rvs(random_state=rng)
print("W:")
print(np.round(W, 4))
print("pdf:", rv.pdf(W))
print("logpdf:", rv.logpdf(W))
print("mean:")
print(rv.mean())
print("var (elementwise):")
print(np.round(rv.var(), 6))
print("mode:")
print(np.round(rv.mode(), 4))
print("entropy:", rv.entropy())
print()
print("has cdf?", hasattr(rv, "cdf"))
print("has fit?", hasattr(rv, "fit"))
# A "CDF" we *can* compute: diagonal marginal (scaled chi-square)
w = float(W[0, 0])
print()
print("P(W_11 <= w) via chi-square:", stats.chi2.cdf(w / scale[0, 0], df=df))
W:
[[14.1372 1.9336]
[ 1.9336 1.8141]]
pdf: 0.0002542559011445013
logpdf: -8.277169313297081
mean:
[[12. 3.2]
[ 3.2 8. ]]
var (elementwise):
[[36. 13.28]
[13.28 16. ]]
mode:
[[7.5 2. ]
[2. 5. ]]
entropy: 8.185358204431287
has cdf? False
has fit? False
P(W_11 <= w) via chi-square: 0.6922652617053153
# Monte Carlo estimate of the matrix CDF: P(W \preceq X)
scale = np.array([
[1.2, 0.3],
[0.3, 0.9],
])
df = 9
# Pick a threshold X (SPD). For example, a multiple of the mean.
X = 1.15 * wishart_mean(df, scale)
Ws = wishart_rvs_bartlett(df, scale, size=30_000 if FAST_RUN else 150_000, rng=rng)
# W \preceq X <=> X - W is PSD. For 2x2, PSD can be checked by eigenvalues.
# (For larger p you would use a Cholesky attempt on X-W, but numerical noise matters.)
mats = X[None, :, :] - Ws
# numerical tolerance: allow tiny negative eigenvalues from floating-point error
lmin = np.linalg.eigvalsh(mats)[:, 0]
prob_hat = np.mean(lmin >= -1e-10)
print("X:")
print(np.round(X, 3))
print(r"Monte Carlo P(W \preceq X) ≈", prob_hat)
X:
[[12.42 3.105]
[ 3.105 9.315]]
Monte Carlo P(W \preceq X) ≈ 0.3414
10) Statistical Use Cases#
10.1 Hypothesis testing#
If (x_1,\dots,x_\nu\sim\mathcal{N}_p(0,\Sigma)), then the scatter matrix (W=\sum_i x_i x_i^\top) is Wishart. This gives exact sampling distributions for statistics built from (W).
A simple example uses whitening under a null (\Sigma_0): [ \Sigma_0^{-1/2}\, W\, \Sigma_0^{-1/2} \sim \mathrm{Wishart}_p(\nu, I) ] when (\Sigma=\Sigma_0). Then (\mathrm{tr}(\Sigma_0^{-1} W)\sim\chi^2_{\nu p}) provides an exact test of total variance.
10.2 Bayesian modeling#
Wishart is conjugate to the precision matrix (\Lambda=\Sigma^{-1}) in a multivariate normal model. With known mean (say 0):
Prior: (\Lambda \sim \mathrm{Wishart}_p(\nu_0, S_0))
Data: (x_i\mid\Lambda \sim \mathcal{N}_p(0,\Lambda^{-1}))
Let (S=\sum_i x_i x_i^\top). Posterior: [ \Lambda\mid x_{1:n} \sim \mathrm{Wishart}_p\big(\nu_0+n,\; (S_0^{-1}+S)^{-1}\big). ]
10.3 Generative modeling#
Wishart gives a principled way to generate random SPD matrices that can be used as:
random covariance / precision matrices
random kernels for Gaussian processes
synthetic correlation structures for simulation studies
# 10.1 Hypothesis test example via whitening + trace statistic
p = 3
nu = 15
Sigma0 = np.array([
[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[0.0, 0.0, 1.0],
])
# Create data under an alternative (not equal to Sigma0)
Sigma_true = np.array([
[1.0, 0.6, 0.2],
[0.6, 1.5, 0.1],
[0.2, 0.1, 0.7],
])
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma_true, size=nu)
W = X.T @ X
# Under H0: Sigma=Sigma0, the whitened scatter is Wishart(nu, I)
# and tr(Sigma0^{-1} W) = tr(W) ~ chi2_{nu*p}.
T = float(np.trace(np.linalg.solve(Sigma0, W)))
p_value = 1.0 - stats.chi2.cdf(T, df=nu * p)
print("Trace test statistic T=tr(Sigma0^{-1} W):", round(T, 4))
print("p-value (right tail):", p_value)
# Note: this test is sensitive to overall variance (trace), not all aspects of covariance.
Trace test statistic T=tr(Sigma0^{-1} W): 30.0502
p-value (right tail): 0.9574804185486624
# 10.2 Bayesian update: Wishart prior on precision
p = 2
n = 30
Sigma_true = np.array([
[1.0, 0.7],
[0.7, 1.8],
])
# Data model: x ~ N(0, Sigma_true)
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma_true, size=n)
S = X.T @ X
# Prior on precision Lambda = Sigma^{-1}
nu0 = p + 2 # must be > p-1
S0 = np.eye(p) # prior scale (on precision)
nu_post = nu0 + n
S_post = np.linalg.inv(np.linalg.inv(S0) + S)
post = wishart(df=nu_post, scale=S_post)
Lambda_samples = post.rvs(size=3_000 if FAST_RUN else 15_000, random_state=rng)
# Posterior mean of precision: E[Lambda] = nu_post * S_post
Lambda_mean = nu_post * S_post
# Convert to covariance by inversion (Monte Carlo)
Sigma_samples = np.linalg.inv(Lambda_samples)
Sigma_mean_mc = Sigma_samples.mean(axis=0)
print("Posterior df:", nu_post)
print("Posterior scale (precision):")
print(np.round(S_post, 4))
print()
print("E[Lambda|data] (closed form):")
print(np.round(Lambda_mean, 4))
print("E[Sigma|data] (MC via inversion):")
print(np.round(Sigma_mean_mc, 4))
Posterior df: 34
Posterior scale (precision):
[[ 0.0314 -0.0074]
[-0.0074 0.0235]]
E[Lambda|data] (closed form):
[[ 1.0692 -0.2528]
[-0.2528 0.7982]]
E[Sigma|data] (MC via inversion):
[[1.1047 0.3462]
[0.3462 1.4882]]
# 10.3 Generative modeling: random covariance matrices
p = 2
nu = 25
Sigma_target = np.array([
[1.0, 0.5],
[0.5, 1.3],
])
# Sample a random scatter matrix with mean nu * Sigma_target
W = wishart_rvs_bartlett(nu, Sigma_target, size=1, rng=rng)
# Turn it into a random covariance estimate with mean Sigma_target
Sigma_sample = W / nu
Y = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma_sample, size=300)
print("Sigma target:")
print(np.round(Sigma_target, 3))
print("Sigma sampled:")
print(np.round(Sigma_sample, 3))
fig = px.scatter(
x=Y[:, 0],
y=Y[:, 1],
opacity=0.6,
title="Data generated from a random covariance (Sigma_sample)",
labels={"x": "y1", "y": "y2"},
)
fig.show()
Sigma target:
[[1. 0.5]
[0.5 1.3]]
Sigma sampled:
[[0.982 0.762]
[0.762 1.08 ]]
11) Pitfalls#
Parameter constraints:
(\nu > p-1) is required for a proper density.
(\Sigma) must be symmetric positive definite (not just semidefinite).
SPD checks are numerical: floating-point symmetry / PSD checks need tolerances.
Determinants and inverses can overflow/underflow:
prefer logpdf over pdf
use slogdet and linear solves (np.linalg.solve) instead of explicit inverses
CDF is hard: for (p>1), the matrix CDF is not available in SciPy; use scalar functionals or Monte Carlo.
Inverse-Wishart is often what you want: many Bayesian covariance models place the prior on (\Sigma) itself rather than on the precision (\Sigma^{-1}), and the conjugate prior for (\Sigma) is the inverse-Wishart.
Interpretation: (\nu) scales the mean; looking at (W/\nu) is often more interpretable as a noisy estimate of (\Sigma).
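Two of these pitfalls in code (a sketch; the noise level and the tail point are arbitrary):
# Floating-point round-trips break exact symmetry; symmetrize, then check SPD.
B = rng.standard_normal((3, 3))
S_spd = B @ B.T + 3 * np.eye(3)
S_noisy = S_spd + 1e-14 * rng.standard_normal((3, 3))
print(bool((S_noisy == S_noisy.T).all()))  # False: exact symmetry is gone
print(np.allclose(S_noisy, S_noisy.T))     # True: tolerant check passes
_ = np.linalg.cholesky(0.5 * (S_noisy + S_noisy.T))  # SPD check after symmetrizing

# Far in the tails, pdf underflows to 0 while logpdf stays finite.
rv_tail = wishart(df=5, scale=np.eye(3))
print(rv_tail.pdf(400 * np.eye(3)), rv_tail.logpdf(400 * np.eye(3)))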
12) Summary#
Wishart is a continuous distribution on SPD matrices, parameterized by degrees of freedom (\nu) and scale (\Sigma).
It is the sampling distribution of Gaussian scatter matrices: (W=\sum_i x_i x_i^\top).
Key formulas:
(\mathbb{E}[W]=\nu\Sigma)
(\mathrm{Cov}(W_{ij},W_{k\ell})=\nu(\Sigma_{ik}\Sigma_{j\ell}+\Sigma_{i\ell}\Sigma_{jk}))
diagonal entries are scaled chi-square
matrix MGF: (|I-2T\Sigma|^{-\nu/2})
For simulation, Bartlett decomposition provides an efficient NumPy-only sampler.
In SciPy, scipy.stats.wishart supports pdf/logpdf, rvs, and mean/var/mode/entropy, but not cdf or fit.